Preface


This  report  was  prepared  by  the  Environmental  Laboratory  (EL)  of  the 
U.S.  Army  Engineer  Waterways  Experiment  Station  (WES),  as  part  of  the 
Water  Quality  Management  for  Reservoirs  and  Tailwaters  Demonstration 
of  the  Water  Operations  Technical  Support  (WOTS)  Program,  sponsored 
by  the  U.S.  Army  Corps  of  Engineers  (HQUSACE).  Mr.  Pete  Juhle, 
HQUSACE,  is  Technical  Monitor.  The  WOTS  is  managed  under  the 
Environmental  Resources  Research  and  Assistance  Programs  (ERRAP), 
Mr.  J.  Lewis  Decell,  WES,  Manager.  Dr.  A.  J.  Anderson  was  Assistant 
Manager,  ERRAP,  for  the  WOTS  program. 

This report was prepared by Dr. Robert F. Gaugush of the Aquatic
Processes and Effects Group (APEG), EL, under the direct supervision of
Dr. Robert H. Kennedy, APEG, and under the general supervision of
Mr. Donald L. Robey, Chief, Ecosystem Research and Simulation Division,
EL, and Dr. John Harrison, Chief, EL.

At the time of publication of this report, Director of WES was
Dr. Robert W. Whalin. Commander was COL Leonard G. Hassell, EN.

This  report  should  be  cited  as  follows: 

    Gaugush, Robert F. 1993. Sample Design Software User's
    Manual. Instruction Report W-93-1. Vicksburg, MS: U.S.
    Army Engineer Waterways Experiment Station.
1 Introduction


Background 

The Sampling Design Software (SDS, Version 2.0) was developed as a
companion to the Instruction Report "Sampling Design for Reservoir
Water Quality Investigations" (Gaugush 1987). Four programs were
developed to assist the user with sampling design and its evaluation.
The DECMATRX program aids the decision-making process in sampling design
through the use of decision matrices. Sampling design evaluation is
performed using variance component analysis (the VARCOM program), error
analysis (the ERROR program), and cluster analysis (the CLUSTER
program).

This user's manual and the SDS disk provided with it are intended to
assist the user in running these programs. They are not intended to
provide instruction on the assumptions and calculation methods of the
statistical techniques used by the programs. The Bibliography presents
a number of sources for basic statistics, sampling design, and more
advanced statistical topics. The instruction report mentioned
previously is an introduction to the topic of sampling design. An
introduction to statistics from a reservoir water quality perspective
can be found in "Statistical Methods for Reservoir Water Quality
Investigations" (Gaugush 1986).


Contents  of  the  SDS  Disk 

A total of 39 files are provided on the SDS disk. The .EXE files are
the compiled program files for DECMATRX, VARCOM, ERROR, and CLUSTER.
These programs were developed and compiled using Turbo Pascal 5.5
(Borland International, Copyright 1984, 1989). The program files also
have associated help files (files with an extension of .Hxx). Three
example data sets are provided for the programs VARCOM, ERROR, and
CLUSTER. These data sets are EG.VAR, EG.ERR, and EG.CLS, respectively.
Some files are required for all of the programs. The files with an
extension of .BGI are graphics device drivers. Only one of these files
will be used for any particular application, but all are provided for
maximum compatibility with the numerous graphics cards to be found in
personal computers (PCs). The files with an extension of .CHR are
graphics character sets that are used in the introductory screens for
each program. These files are supplied with the Turbo Pascal 5.5
compiler (Borland International, Copyright 1984, 1989).

The COLORS.DAT file is a short ASCII-format text file that is read by
all of the programs to set the screen colors. If, after running the
programs, you would like to change the screen colors, simply edit this
file. Notes on color selection are included in the file.

A complete listing of the files on the SDS disk is provided below:

Decision Matrices files:

    DECMATRX.EXE  -  program file
    DECMATRX.H01  -  help files
    DECMATRX.H02
    DECMATRX.H03
    DECMATRX.H04
    DECMATRX.H05

Variance Component Analysis files:

    VARCOM.EXE    -  program file
    VARCOM.H01    -  help files
    VARCOM.H02
    VARCOM.H03
    EG.VAR        -  example data file

Error Analysis files:

    ERROR.EXE     -  program file
    ERROR.H01     -  help files
    ERROR.H02
    ERROR.H03
    ERROR.H04
    ERROR.H05
    EG.ERR        -  example data file

Cluster Analysis files:

    CLUSTER.EXE   -  program file
    CLUSTER.H00   -  help files
    CLUSTER.H01
    CLUSTER.H02
    CLUSTER.H03
    CLUSTER.H04
    CLUSTER.H05
    CLUSTER.H06
    CLUSTER.H07
    CLUSTER.H08
    CLUSTER.H09
    EG.CLS        -  example data file

Files used for all programs:

    ATT.BGI       -  graphics drivers
    CGA.BGI
    EGAVGA.BGI
    HERC.BGI
    IBM8514.BGI
    PC3270.BGI
    LITT.CHR      -  character sets
    TRIP.CHR
    COLORS.DAT    -  data file for setting screen colors



Installation 

The SDS software will run from a single 360K 5.25-in. floppy disk (the
software is supplied in this format), but performance will be improved
considerably by installing the software on a hard disk drive.

To  install  the  software  on  a  hard  disk: 

a.  Create  a  subdirectory  for  the  software 

MD  C:\SAMPLING 

b.  Copy  all  files  from  the  SDS  disk  to  the  new  directory 

CD \SAMPLING
COPY A:*.*



(The  above  examples  assume  that  your  C:  drive  is  a  hard  disk  and  that  the 
SDS  disk  is  in  drive  A:) 


Hardware  Requirements 

The SDS software has been tested on a number of different PC
configurations. Testing has included 8088 (basic PCs), 80286 (AT types),
and 80386 machines. Numeric co-processors are not required, but will be
used if present. The CGA, EGA, VGA, and Hercules graphics drivers are
supported.


User  Assistance 

Please  contact: 

    Robert H. Kennedy, CEWES-2S-A
    U.S. Army Engineer Waterways Experiment Station
    3909 Halls Ferry Road
    Vicksburg, MS 39180-6199
    Telephone: (601) 634-3659

if  you  need  assistance  with  the  operation  of  the  SDS  software. 
2 Decision Matrices


A decision matrix is an aid to the determination of sample size for
multivariable sampling programs and can be used for either simple random
or stratified random sampling designs. The decision matrix is simply a
tabular presentation that incorporates the factors necessary to
determine sample size: (a) an estimate of the mean, (b) an estimate of
the variability, (c) desired precision, (d) the acceptable probability
of error, and (e) the costs associated with sampling. See Gaugush (1987)
for a more complete discussion of determining sample size and the use of
decision matrices.
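
The way these factors combine can be sketched in a few lines of Python. This is only an illustration and not the DECMATRX algorithm: it assumes a two-sided error probability and substitutes the normal deviate for the t statistic, so its results run slightly below the t-based values the program reports. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def approx_sample_size(cv, precision, error_prob):
    """Approximate n so that the sample mean falls within +/- precision
    (expressed as a fraction of the mean) with probability 1 - error_prob.

    Assumption for this sketch: the normal deviate z replaces Student's t,
    so results are slightly smaller than t-based sample sizes.
    """
    if cv <= 0 or precision <= 0 or not 0 < error_prob < 1:
        raise ValueError("cv and precision must be > 0; 0 < error_prob < 1")
    z = NormalDist().inv_cdf(1 - error_prob / 2)  # two-sided critical value
    return math.ceil((z * cv / precision) ** 2)


def sampling_cost(n, unit_cost):
    """Total cost of collecting and analyzing n samples."""
    return n * unit_cost


if __name__ == "__main__":
    n = approx_sample_size(cv=0.56, precision=0.10, error_prob=0.05)
    print(n, sampling_cost(n, 25.0))
```

Note that the mean itself drops out once variability is expressed as a coefficient of variation and precision as a fraction of the mean, which is why only the C.V., precision, and error probability appear as arguments.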


Program  Execution 

To run the Decision Matrices program, simply type "decmatrx" at the
DOS prompt. Be sure your default directory (i.e., the directory that you
are in when you enter the above command) contains all of the files on the
Sampling Design Software disk.

After  the  above  command  is  entered,  the  program  will  prompt  you  for 
all  of  the  necessary  inputs.  Program  flow  is  as  follows: 

a.  Introductory  screen. 

b.  Prompt for output route - output may be routed to either the screen
only or to a disk file as well as the screen (if disk file output is
chosen, the program will prompt for a file name).

c.  Data entry.

d.  View output.

e.  Repeat analysis with new data.

f.  Exit program.

A documented session presented below provides a more complete view
of the program flow.


Data  Entry 


DECMATRX is an interactive program and allows you to enter data
during the execution of the program. Two data entry windows are used to
(a) specify the parameters to be used by the program, and (b) enter
estimates of the central tendency (i.e., the mean) and dispersion (i.e.,
the variance) of the variables to be sampled.

In the first data entry window, six fields are highlighted for input.
(In the representations of the data entry windows shown below,
highlighted fields are indicated by underlining the field.) In the first
field enter the value (from 1 to 6) of the number of variables to be used
in the decision matrix. The remaining fields are for the error
probabilities and the levels of precision to be used in the analysis.
Default values are provided for these fields, but they can be changed by
entering the desired value in the respective field. Five possible values
for the error probability are supported; the choice is restricted to
these values because of the method used to calculate the t statistic in
the program. Values for precision can fall anywhere within the specified
range of possible values. Generally, you will only need to specify the
number of variables because the default values for error probability and
precision provide a wide range of sample sizes.


The arrow keys allow movement between the fields. The right and
down arrows move the cursor to the next field while the left and up
arrows move the cursor to the previous field. Typographical errors within
a field can be corrected by using the backspace key to delete the error
and then retyping the field. Errors can also be corrected after leaving
the field that contains the error, but in this case the entire field must
be retyped.

The second data entry window consists of four fields for each of the n
variables specified in the first window. The example shown below
assumes that the analysis is to be performed on three variables. As
shown, a name, mean, coefficient of variation (C.V.), and cost must be
specified for each variable. As before, the arrow keys allow for movement
between the fields. Variable names can contain any characters (uppercase
or lowercase letters and numbers may be used), but blank spaces are not
allowed in variable names. Decimal points are not required in the
remaining fields but should be used for clarity. Values for the C.V.'s
are expressed as a decimal fraction and not as a percentage. For example,
the C.V. would be expressed as 0.50, not as 50.0 percent, for a variable
with a mean of 50.0 and a standard deviation of 25.0.
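
If preliminary data are available, the C.V. to enter can be computed directly from them. The following is a minimal sketch; the function name is hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the manual does not say which form of the standard deviation is intended.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the C.V. (sample standard deviation / mean) as a decimal
    fraction, the form expected in the C.V. field."""
    m = mean(values)
    if m == 0:
        raise ValueError("C.V. is undefined when the mean is zero")
    return stdev(values) / m


if __name__ == "__main__":
    # Arbitrary illustrative values, not data from the manual
    print(round(coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9]), 3))
```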


Error  Messages 

As the data are entered into the program, DECMATRX checks for
errors. The program checks the fields for number of variables, error
probability, and precision for nonnumeric characters. If any are found,
DECMATRX will issue one of the following error messages:

INPUT ERROR: NUMBER OF VARIABLES INCORRECTLY ENTERED
INPUT ERROR: ERROR PROBABILITY INCORRECTLY ENTERED
INPUT ERROR: PRECISION INCORRECTLY ENTERED

The program also checks these same fields to determine if the values
entered are within the range of values supported by the program. If any
fall outside of the range of supported values, the program will issue one
of the following messages:

INPUT ERROR: NUMBER OF VARIABLES IS OUT OF RANGE
INPUT ERROR: ERROR PROBABILITY IS OUT OF RANGE
INPUT ERROR: LEVEL OF PRECISION IS OUT OF RANGE

The second data entry window is also checked for errors. If a C.V. is
less than or equal to zero, DECMATRX reports:

INPUT ERROR: C.V. <= 0

If a sampling cost is entered as a negative number, then the program
issues the following error message:

INPUT ERROR: COST < 0

If any nonnumeric characters are entered for any of the means, C.V.'s,
or costs, then one of the following messages will be displayed:

INPUT ERROR: MEAN INCORRECTLY ENTERED
INPUT ERROR: C.V. INCORRECTLY ENTERED
INPUT ERROR: COST INCORRECTLY ENTERED

Pressing any key after an error message has been reported will return
the program to the data entry screen with the error. Correct the error
and continue.
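
The checks on the second data entry window can be mimicked when preparing values ahead of a session. The sketch below is an assumption-laden illustration, not the program's code: the function name, the order of the checks, and the choice to collect all messages at once are hypothetical, while the conditions (nonnumeric entry, C.V. less than or equal to zero, negative cost) come from the description above.

```python
def check_variable_entry(mean_text, cv_text, cost_text):
    """Return a list of error messages for one row of the second window.

    Conditions follow the manual; the function name and the order of the
    checks are assumptions made for this sketch.
    """
    errors = []
    parsed = {}
    fields = (("MEAN", mean_text), ("C.V.", cv_text), ("COST", cost_text))
    for label, text in fields:
        try:
            parsed[label] = float(text)
        except ValueError:
            # Nonnumeric characters in a numeric field
            errors.append(f"INPUT ERROR: {label} INCORRECTLY ENTERED")
    if "C.V." in parsed and parsed["C.V."] <= 0:
        errors.append("INPUT ERROR: C.V. <= 0")
    if "COST" in parsed and parsed["COST"] < 0:
        errors.append("INPUT ERROR: COST < 0")
    return errors


if __name__ == "__main__":
    print(check_variable_entry("55.", "0.56", "25"))   # valid row
    print(check_variable_entry("55.", "0", "-1"))      # two range errors
```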


Documented  Session 

This  example  session  with  DECMATRX  uses  the  following  data: 


    Variable     Mean     C.V.     Cost

    TP            55.     0.56     25.0
    TN          1614.     0.28     25.0
    CHLA          35.     0.52     25.0

The object of the analysis is to determine sample sizes and costs
associated with sampling these three variables over an annual period.

Sample sizes and costs for each variable are presented with respect to
error probability and precision. The results of the analysis can be used
to develop a sampling design within both statistical and financial
constraints.

After pressing any key, the program prompts for the output route.



Select  2  (disk  file  output).  DECMATRX  then  prompts  for  the  output  file 
name.  Use  MATRIX.OUT  for  this  session. 


DECMATRX  then  displays  the  first  data  entry  window.  (Underlined 
fields  represent  fields  that  will  be  highlighted  on  the  PC  screen). 

Press F1 for help.

    DECISION MATRIX

    VARIABLE   NAME         MEAN       C.V.       UNIT COST
       1       __________   ________   ________   ________
       2       __________   ________   ________   ________

    Help - Data input

    Enter data in each of the high-lighted fields. Provide a name, mean,
    coefficient of variation, and sampling cost for each variable. The
    sampling costs are usually analytical costs per sample. If costs are
    not an issue, simply enter a 1 for the cost for each variable.

    To move between fields:

        left or up arrow     -  previous field
        right or down arrow  -  next field

    F2 - Continue



Press  F2  to  continue  and  clear  the  help  window.  DECMATRX  returns  to 
the  data  entry  window.  Enter  data  to  produce  the  screen  shown  below. 


    DECISION MATRIX

    VARIABLE   NAME     MEAN     C.V.     UNIT COST
       1       TP         55.    0.56     25
       2       TN       1614.    0.28     25
       3       CHLA       35.    0.52     25

    F1 - Help   F2 - Continue

When data entry is completed, press F2 to continue. The program displays
sample sizes with respect to variable, error probability, and precision.

    SAMPLE SIZE

    PRECISION:         0.10                  0.20
    ERROR:       0.05  0.10  0.20      0.05  0.10  0.20
    VARIABLE
    TP            123    87    53        33    23    14
    TN             33    23    14        10     7     4
    CHLA          106    75    46        28    20    12

    F1 - Help   F2 - Exit   F3 - Costs


Press F1 for help.

    SAMPLE SIZE

    PRECISION:         0.10                  0.20
    ERROR:       0.05  0.10  0.20      0.05  0.10  0.20
    VARIABLE
    TP            123    87    53        33    23    14
    TN             33    23    14        10     7     4
    CHLA          106    75    46        28    20    12

    Sample sizes are provided for each combination of variable, error
    probability, and precision.

    F2 - Continue

Press F2 to continue and clear the help window. Press F3 to see the costs
window.

    COST

    PRECISION:         0.10                  0.20
    ERROR:       0.05  0.10  0.20      0.05  0.10  0.20
    VARIABLE
    TP           3075  2175  1325       825   575   350
    TN            825   575   350       250   175   100
    CHLA         2650  1875  1150       700   500   300

    F1 - Help   F2 - Exit   F3 - Sample size


Press F1 for help.

    COST

    PRECISION:         0.10                  0.20
    ERROR:       0.05  0.10  0.20      0.05  0.10  0.20
    VARIABLE
    TP           3075  2175  1325       825   575   350
    TN            825   575   350       250   175   100
    CHLA         2650  1875  1150       700   500   300

    Help - Sampling costs

    Sampling costs are provided for each combination of variable, error
    probability, and precision.

    F2 - Continue

    F1 - Help   F2 - Exit   F3 - Sample size

Press F2 to continue and clear the help window. Any time after data
entry, F3 allows switching between the sample size and cost windows.
Press F3 to return to the sample size window.

    SAMPLE SIZE

    PRECISION:         0.10                  0.20
    ERROR:       0.05  0.10  0.20      0.05  0.10  0.20
    VARIABLE
    TP            123    87    53        33    23    14
    TN             33    23    14        10     7     4
    CHLA          106    75    46        28    20    12

    F1 - Help   F2 - Exit   F3 - Costs


At this point, you can either repeat the program with new data or exit
the program. Respond with "N" to end the documented session.
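
The cost window in this session is the sample size window multiplied by the unit cost entered for each variable (25 for all three variables here). A short sketch that rebuilds the cost values from the sample sizes shown above; the variable and function names are hypothetical.

```python
UNIT_COST = 25.0  # analytical cost per sample used in the documented session

# Sample sizes from the documented session, ordered as precision 0.10
# (error 0.05, 0.10, 0.20) followed by precision 0.20 (error 0.05, 0.10, 0.20)
SAMPLE_SIZES = {
    "TP": [123, 87, 53, 33, 23, 14],
    "TN": [33, 23, 14, 10, 7, 4],
    "CHLA": [106, 75, 46, 28, 20, 12],
}


def cost_matrix(sample_sizes, unit_cost):
    """Multiply every sample size by the per-sample cost."""
    return {
        name: [int(n * unit_cost) for n in sizes]
        for name, sizes in sample_sizes.items()
    }


if __name__ == "__main__":
    for name, costs in cost_matrix(SAMPLE_SIZES, UNIT_COST).items():
        print(f"{name:<6}" + "".join(f"{c:>7}" for c in costs))
```

Reading the two windows side by side in this way makes it easy to find the tightest combination of precision and error probability that still fits a fixed analytical budget.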
Example Output File

    DECISION MATRIX

    INPUT DATA

    ERROR PROBABILITIES     0.05   0.10   0.20
    LEVELS OF PRECISION     0.10   0.20

    VARIABLE       MEAN         C.V.       UNIT COST
    TP           5.500E+01   5.600E-01   2.500E+01
    TN           1.614E+03   2.800E-01   2.500E+01
    CHLA         3.500E+01   5.200E-01   2.500E+01

    SAMPLE SIZE

    PRECISION:         0.10                  0.20
    ERROR:       0.05  0.10  0.20      0.05  0.10  0.20
    VARIABLE
    TP            123    87    53        33    23    14
    TN             33    23    14        10     7     4
    CHLA          106    75    46        28    20    12

    COST

    PRECISION:         0.10                  0.20
    ERROR:       0.05  0.10  0.20      0.05  0.10  0.20
    VARIABLE
    TP           3075  2175  1325       825   575   350
    TN            825   575   350       250   175   100
    CHLA         2650  1875  1150       700   500   300



3 Variance Component Analysis


Variance component analysis is a technique for quantifying the sources
of variability in the data resulting from a given sampling design. The
analysis results in the determination of each design component's
contribution to the overall variance. Based on these results, sampling
effort allocated to a given component of the design could be reduced or
eliminated.

See Winer (1971) for a comprehensive treatment of variance component
analysis.


Data  Set  Preparation 

The VARCOM program requires that input data sets be prepared prior
to its use (i.e., data input during the program is not available). Data
sets can be prepared with most text editors and word processing software.
The data sets may contain only ASCII characters and none of the special
characters used by most word processors for formatting. If you use a word
processor to generate your data sets, be sure to save the files in DOS or
ASCII format.

Data  in  VARCOM  input  files  are  organized  into  four  groups: 

    Group 1  -  title
    Group 2  -  problem size identifiers
    Group 3  -  factor and level information
    Group 4  -  data records

An example data set, EG.VAR, is provided on the SDS distribution
diskette and is shown below:

Data Group 1:

    EAU GALLE - CHLOROPHYLL - MAY 1981

Data Group 2:

    3 24

Data Group 3:

    STATION              3
                 10
                 20
                 30
    DAY                  2
                  5
                 19
    DEPTH                4
                  0
                  1
                  2
                  3

Data Group 4:

                 10              5              0     66.92
                 10              5              1     69.89
                 10              5              2     69.97
                 10              5              3     68.51
                 10             19              0      4.77
                 10             19              1      6.23
                 10             19              2      4.38
                 10             19              3      3.88
                 20              5              0     46.13
                 20              5              1     39.85
                 20              5              2     44.17
                 20              5              3     46.45
                 20             19              0      3.37
                 20             19              1      3.38
                 20             19              2      6.11
                 20             19              3      4.71
                 30              5              0     57.28
                 30              5              1     48.00
                 30              5              2     59.71
                 30              5              3     58.39
                 30             19              0      3.50
                 30             19              1      3.70
                 30             19              2      8.64
                 30             19              3      6.47


Data Group 1 consists of a single line specifying a title for the data
set (maximum of 60 characters). Data Group 2 is a single line with two
items. The first is the number of factors in the data set (VARCOM allows
a maximum of three factors), and the second indicates the number of
observations in Data Group 4. Data Group 3 names the factors, specifies
the number of levels for each factor, and provides the name for each of
the levels. A maximum of 100 levels per factor is supported by VARCOM. In
the example data set, three factors are specified in Data Group 2. The
three factors used in the example data set are STATION, DAY, and DEPTH.
STATION has three levels (10, 20, and 30), which means that three
stations were sampled. DAY has two levels (samples were taken on the 5th
and the 19th of May). DEPTH has four levels (samples were taken at 1-m
intervals from the surface to 3 m). Data Group 4 lists the value of the
variable to be analyzed (chlorophyll a in the example data set) for each
combination of the factors. For example, at station 10 on the 5th of May
at a depth of 1 m, the chlorophyll a concentration was 69.89 µg/l
(second line of Data Group 4).

VARCOM requires that the data in Data Groups 3 and 4 be placed in
specific columns. A portion of Data Group 3 with column identifiers is
shown below.

123456789012345678901234567890  -  Column numbers
STATION              3          -  Number of levels
             10  -+
             20   +-  Level names
             30  -+
|
+--  Factor name


A factor name can have a maximum of 20 characters and must begin in
column 1 (i.e., factor names must be left-justified). Leave at least one
blank space after the 20-column factor name field; the value for the
number of levels should therefore begin in column 22 or greater. A level
name (in the following row) can have a maximum of 15 characters and must
end in column 15 (i.e., all level names must be right-justified).

A portion of Data Group 4 with column identifiers is shown below.

         1         2         3         4         5         6
123456789012345678901234567890123456789012345678901234567890  -  Column numbers
             10              5              0          66.92  -+
             10              5              1          69.89   |
             10              5              2          69.97   +-  Variable data (chlorophyll a)
             10              5              3          68.51  -+
              |              |              |
              |              |              +--  Level 3 names (depths)
              |              +--  Level 2 names (dates)
              +--  Level 1 names (stations)


Level 1 names must end in column 15, level 2 names end in column 30,
and level 3 names end in column 45. At least one blank column must
separate the last level name from the variable data.
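
A data set can be checked against these column rules before it is handed to VARCOM. The sketch below reads the four data groups using the positions described above. It is not part of the SDS software: the function names are hypothetical, and skipping blank lines and splitting Data Group 2 on whitespace are assumptions about details the manual does not spell out.

```python
FIELD_WIDTH = 15  # level names end in columns 15, 30, and 45


def parse_data_line(line, n_factors):
    """Split one Data Group 4 line into (level names, value)."""
    levels = tuple(
        line[i * FIELD_WIDTH:(i + 1) * FIELD_WIDTH].strip()
        for i in range(n_factors)
    )
    value = float(line[n_factors * FIELD_WIDTH:].strip())
    return levels, value


def parse_varcom(text):
    """Parse a VARCOM-style data set into title, factors, observations."""
    # Assumption: blank lines carry no meaning and can be skipped
    lines = [ln for ln in text.splitlines() if ln.strip()]
    title = lines[0]                                  # Data Group 1
    n_factors, n_obs = (int(tok) for tok in lines[1].split())  # Group 2
    pos = 2
    factors = {}
    for _ in range(n_factors):                        # Data Group 3
        name = lines[pos][:20].strip()                # columns 1-20
        n_levels = int(lines[pos][20:].strip())       # column 22 or greater
        factors[name] = [
            ln[:FIELD_WIDTH].strip()
            for ln in lines[pos + 1:pos + 1 + n_levels]
        ]
        pos += 1 + n_levels
    observations = [                                  # Data Group 4
        parse_data_line(ln, n_factors) for ln in lines[pos:pos + n_obs]
    ]
    return title, factors, observations
```

Slicing by fixed column position, rather than splitting on whitespace, is deliberate: it fails loudly on a line whose level names are not right-justified in their fields, which is the kind of mistake the column rules are meant to prevent.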

A data set for a two factor variance component analysis would appear
as follows:

    EAU GALLE - CHLOROPHYLL - MAY 1981
    2 24
    STATION              3
                 10
                 20
                 30
    DAY                  2
                  5
                 19
                 10              5     66.92
                 10              5     69.89
                 10              5     69.97
                 10              5     68.51
                 10             19      4.77
                 10             19      6.23
                 10             19      4.38
                 10             19      3.88
                 20              5     46.13
                 20              5     39.85
                 20              5     44.17
                 20              5     46.45
                 20             19      3.37
                 20             19      3.38
                 20             19      6.11
                 20             19      4.71
                 30              5     57.28
                 30              5     48.00
                 30              5     59.71
                 30              5     58.39
                 30             19      3.50
                 30             19      3.70
                 30             19      8.64
                 30             19      6.47


Note that multiple observations for combinations of levels are allowed.
In the above data set, there are four observations for each combination
of station and day. It is also important to note that the order of lines
in Data Group 4 is not important. The above data set could be just as
correctly specified as:


    EAU GALLE - CHLOROPHYLL - MAY 1981
    2 24
    STATION              3
                 10
                 20
                 30
    DAY                  2
                  5
                 19
                 10              5     66.92
                 10              5     69.89
                 10              5     69.97
                 10              5     68.51
                 20              5     46.13
                 20              5     39.85
                 20              5     44.17
                 20              5     46.45
                 30              5     57.28
                 30              5     48.00
                 30              5     59.71
                 30              5     58.39
                 10             19      4.77
                 10             19      6.23
                 10             19      4.38
                 10             19      3.88
                 20             19      3.37
                 20             19      3.38
                 20             19      6.11
                 20             19      4.71
                 30             19      3.50
                 30             19      3.70
                 30             19      8.64
                 30             19      6.47

As long as the level names and the variable data on each line are placed
in the proper position, the lines of Data Group 4 can be arranged in
any convenient order. A one factor data set would appear as follows:

    EAU GALLE - CHLOROPHYLL - MAY 1981
    1 24
    DAY                  2
                  5
                 19
                  5     66.92
                  5     69.89
                  5     69.97
                  5     68.51
                 19      4.77
                 19      6.23
                 19      4.38
                 19      3.88
                  5     46.13
                  5     39.85
                  5     44.17
                  5     46.45
                 19      3.37
                 19      3.38
                 19      6.11
                 19      4.71
                  5     57.28
                  5     48.00
                  5     59.71
                  5     58.39
                 19      3.50
                 19      3.70
                 19      8.64
                 19      6.47

Suggestion: use an extension of .VAR for VARCOM data files. This will
distinguish them from other data files.


Program  Execution 

To run the Variance Component Analysis program, simply type "varcom"
at the DOS prompt. Be sure your default directory (i.e., the directory
that you are in when you enter the above command) contains all of
the files on the Sampling Design Software disk.

After  the  above  command  is  entered,  the  program  will  prompt  you  for 
all  of  the  necessary  inputs.  Program  flow  is  as  follows: 

a.  Introductory  screen. 

b.  Prompt for output route - output may be routed to either the screen
only or to a disk file as well as the screen (if disk file output is
chosen, the program will prompt for a file name).

c.  Prompt for input file name.

d.  View output.

e.  Repeat analysis with new data.

f.  Exit program.

A  documented  session  presented  below  provides  a  more  complete  view 
of  the  program  flow. 


Error  Messages 

After prompting for the input and output file names, VARCOM performs
an error check on the input data set. If the data set specifies more
than three factors for the analysis, the program reports:

ERROR: NUMBER OF FACTORS EXCEEDS MAX. FACTORS

If the number of levels for any of the factors exceeds 100, the following
error message is reported:

ERROR: NUMBER OF LEVELS FOR FACTOR 1
EXCEEDS THE MAX. NUMBER OF LEVELS

If the number of observations is greater than 3,500, VARCOM reports:

ERROR: NUMBER OF OBSERVATIONS EXCEEDS MAXIMUM

If, for any factor, the number of levels specified does not agree with
the level names listed, the program provides the following error message:

ERROR: LEVEL ID NOT FOUND

VARCOM  terminates  after  reporting  any  of  the  above  error  messages. 
Edit  the  input  data  file  and  run  the  program  again. 
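
The three size limits are easy to test for before running the program. The sketch below is an illustration only: the function name and the choice to stop at the first violation are assumptions, while the limits (three factors, 100 levels per factor, 3,500 observations) and the message wording come from the text above.

```python
MAX_FACTORS = 3
MAX_LEVELS = 100
MAX_OBSERVATIONS = 3500


def check_limits(factors, n_observations):
    """Return the first limit violation as a message, or None if all pass.

    `factors` maps each factor name to its list of level names.
    """
    if len(factors) > MAX_FACTORS:
        return "ERROR: NUMBER OF FACTORS EXCEEDS MAX. FACTORS"
    for index, levels in enumerate(factors.values(), start=1):
        if len(levels) > MAX_LEVELS:
            return (f"ERROR: NUMBER OF LEVELS FOR FACTOR {index} "
                    "EXCEEDS THE MAX. NUMBER OF LEVELS")
    if n_observations > MAX_OBSERVATIONS:
        return "ERROR: NUMBER OF OBSERVATIONS EXCEEDS MAXIMUM"
    return None


if __name__ == "__main__":
    example = {"STATION": ["10", "20", "30"], "DAY": ["5", "19"],
               "DEPTH": ["0", "1", "2", "3"]}
    print(check_limits(example, 24))  # None: within all limits
```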


Documented  Session 


This example session with VARCOM uses the EG.VAR data set
provided on the SDS distribution diskette. These data were derived from
studies conducted on Eau Galle Reservoir in west-central Wisconsin. The
data set has three factors: STATION, DAY, and DEPTH. STATION has
three levels (stations 10, 20, and 30), DAY has two levels (the 5th and
19th of May), and DEPTH has four levels (depths of 0, 1, 2, and 3 m).

The object of the analysis is to determine the distribution of the
variance in chlorophyll a among the three factors. If all of the factors
account for a significant fraction of the variance in chlorophyll a, then
the sampling design is efficient. If, on the other hand, one or two of
the factors account for most of the variance, then the sampling effort
could be reduced. The sampling design could be modified to include only
those factors that explain the majority of the variance.

Entering the command "VARCOM" at the DOS prompt begins the
program.


After  pressing  any  key,  the  program  prompts  for  the  output  route. 

Press F2 to continue and clear the help window.

Select 2 (disk file output). VARCOM then prompts for the output file
name. Use EG.OUT for this session.

VARCOM then displays the results of the variance component analysis.


    VARIANCE COMPONENT ANALYSIS

    EAU GALLE - CHLOROPHYLL - MAY 1981

    SOURCE              DF        SS          MS
    STATION              2     6.30E+02    3.15E+02
    DAY                  1     1.58E+04    1.58E+04
    DEPTH                3     4.52E+01    1.51E+01
    ERROR               17     6.91E+02    4.07E+01
    CORRECTED TOTAL     23     1.72E+04

    VARIANCE COMPONENT       ESTIMATE     PERCENT TOTAL
    VAR(STATION)             3.43E+01          2.47
    VAR(DAY)                 1.31E+03         94.90
    VAR(DEPTH)              -4.27E+00         < .01
    VAR(ERROR)               4.07E+01          2.94

    F1 - Help   F2 - Exit


Press F1 for help.

    VARIANCE COMPONENT ANALYSIS

    EAU GALLE - CHLOROPHYLL - MAY 1981

    SOURCE              DF        SS          MS
    STATION              2     6.30E+02    3.15E+02

    Help - Variance component analysis

    Output is divided into two sections. The upper section of the
    window provides the output of an n-way analysis of variance.

    The 'Source' column lists the sources of variability within the data
    set. The 'DF' column provides the degrees of freedom for each of the
    sources. The sum of squares and the mean square error are given in
    the 'SS' and 'MS' columns, respectively. The lower section of the
    output lists the variance component estimates and the relative
    contribution of each source to the overall variance.

    F2 - Continue

    F1 - Help   F2 - Exit



Press  F2  to  continue  and  clear  the  help  window. 


    VARIANCE COMPONENT ANALYSIS

    EAU GALLE - CHLOROPHYLL - MAY 1981

    SOURCE              DF        SS          MS
    STATION              2     6.30E+02    3.15E+02
    DAY                  1     1.58E+04    1.58E+04
    DEPTH                3     4.52E+01    1.51E+01
    ERROR               17     6.91E+02    4.07E+01
    CORRECTED TOTAL     23     1.72E+04

    VARIANCE COMPONENT       ESTIMATE     PERCENT TOTAL
    VAR(STATION)             3.43E+01          2.47
    VAR(DAY)                 1.31E+03         94.90
    VAR(DEPTH)              -4.27E+00         < .01
    VAR(ERROR)               4.07E+01          2.94

    F1 - Help   F2 - Exit


The variance component analysis indicates that most of the variance
(almost 95 percent) is explained by sampling date (the DAY factor). For
this data set, sampling stations and depths together account for less
than 3 percent of the total variance. Press F2 to exit.
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Example  Output  File 


    VARIANCE COMPONENT ANALYSIS

    EAU GALLE - CHLOROPHYLL
    MAY 1981                          Title

    SOURCE
    STATION
    DAY
    DEPTH
    ERROR
    CORRECTED TOTAL

    VARIANCE COMPONENT
    VAR(STATION)
    VAR(DAY)
    VAR(DEPTH)
    VAR(ERROR)
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4  Error  Analysis 


Error analysis is a statistical technique that uses the observed
distribution of variance to improve an existing sampling design. The
results of the error analysis are used to redistribute samples among the
existing strata so as to produce the minimum variance about the mean.
The technique can be applied to the data of a stratified sampling design
or to the data from a simple random or a systematic sample that has been
subjected to poststratification (i.e., defining strata a posteriori).
See Gaugush (1987) for a more detailed description of stratified
sampling and the use of error analysis.


Data  Set  Preparation 

The ERROR program requires that input data sets be prepared prior to
its use (i.e., data cannot be entered while the program is running).
Data sets can be prepared with most text editors and word processing
software. The data sets may contain only ASCII characters and none of
the special characters used by most word processors for formatting. If
you use a word processor to generate your data sets, be sure to save the
files in DOS or ASCII format.

Data in ERROR input files are organized into four groups:

Group 1 - title
Group 2 - problem size identifier
Group 3 - strata weights
Group 4 - data records
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An  example  data  set,  EG.ERR,  is  provided  on  the  SDS  distribution 
diskette  and  is  shown  below: 


    EAU GALLE - 1981 - STATION 20      Data Group 1
    4                                  Data Group 2
    1 .167                             Data Group 3
    2 .334
    3 .167
    4 .332
    1 7.8745E+01                       Data Group 4
    1 1.6735E+02
    1 4.2722E+01
    1 4.3925E+00
    2 5.1533E+00
    2 2.7610E+01
    2 2.9570E+01
    2 5.7273E+01
    2 4.2378E+01
    2 4.0602E+01
    2 5.5306E+01
    2 6.5534E+01
    2 5.2158E+01
    3 3.1465E+01
    3 2.4320E+01
    3 4.0684E+01
    3 3.1248E+01
    4 1.3363E+01
    4 1.8966E+01
    4 1.0322E+01
    4 2.8420E+00
    4 3.5075E+00
    4 8.2300E+00
    4 2.8575E+01
    4 2.5618E+01

Data Group 1 consists of a single line for the title of the data set
(maximum of 60 characters). Data Group 2 also is a single line that
specifies the number of strata in the data set. The ERROR program
supports a maximum of 25 strata. Data Group 3 specifies the strata
numbers and weights. The strata numbers must be in numerical order and
start with 1. The strata weights must sum to 1.00. At least one blank
space must separate the stratum number and stratum weight in Data
Group 3. Data Group 4 lists the observations of the sample data set,
each consisting of the stratum number and the value of the variable
(separated by at least one blank space). (Note: Although the example
data set uses the computer representation of scientific notation (i.e.,
2.5618E+01 is the computer form of 2.5618 x 10^1) for the data values,
this is not required. These numbers could have been entered in a more
typical decimal notation.)

Suggestion: use an extension of .ERR for ERROR data files. This will
distinguish them from other data files.
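
The four data groups described above can also be assembled by a short script rather than typed by hand. The following is a minimal Python sketch and is not part of SDS: the function name `format_error_input` and its arguments are hypothetical, and the choice of three decimals for weights and scientific notation for values is an assumption (ordinary decimal notation is equally acceptable to ERROR, as noted above).

```python
def format_error_input(title, weights, observations):
    """Return the text of an ERROR input file (Data Groups 1-4).

    title        -- data set title, at most 60 characters
    weights      -- stratum weights, listed for strata 1, 2, 3, ...
    observations -- (stratum number, value) pairs
    """
    if len(title) > 60:
        raise ValueError("title exceeds 60 characters")
    lines = [title]                               # Group 1: title
    lines.append(str(len(weights)))               # Group 2: number of strata
    for number, weight in enumerate(weights, start=1):
        lines.append(f"{number} {weight:.3f}")    # Group 3: stratum, weight
    for stratum, value in observations:
        lines.append(f"{stratum} {value:.4E}")    # Group 4: stratum, value
    return "\n".join(lines) + "\n"


# Hypothetical two-stratum data set, used only to show the layout.
text = format_error_input(
    "EXAMPLE - TWO STRATA",
    [0.4, 0.6],
    [(1, 12.5), (1, 14.0), (1, 11.2), (2, 30.1), (2, 28.7), (2, 33.3)],
)
print(text)
```

Writing the returned text to a file with an .ERR extension gives a data set in the layout shown for EG.ERR.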


Program  Execution 

To run the Error Analysis program, simply type “error” at the DOS
prompt. Be sure your default directory (i.e., the directory that you are
in when you enter the above command) contains all of the files provided
on the Sampling Design Software disk.
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After  the  above  command  is  entered,  the  program  will  prompt  you  for 
all  of  the  necessary  inputs.  Program  flow  is  as  follows: 

a.  Introductory  screen. 

b. Prompt for output route - output may be routed to either the screen
only or to a disk file as well as the screen (if disk file output is
selected, the program will prompt for a disk file name).

c. Prompt for input file name.

d. View output.

e. Repeat analysis with new data.

f. Exit program.

A  documented  session  presented  below  provides  a  more  complete  view 
of  program  flow. 


Error  Messages 

After prompting for the input and output file names, ERROR performs
an error check on the input data set. If the data set specifies more
than 25 strata for the analysis, the program reports:

ERROR : NUMBER OF STRATA EXCEEDS MAXIMUM

If the strata weights do not sum to 1.00, the following error message is
reported:

ERROR : WEIGHTS DO NOT SUM TO 1.00

ERROR reports the following message if any of the strata have fewer
than three observations:

ERROR : LESS THAN 3 SAMPLES IN STRATUM 1
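
The three checks above can be run on a data set before it is handed to ERROR. This is a minimal Python sketch, not the program's own code: the function name `check_error_input`, the lowercase message wording, and the tolerance used when comparing the weight sum with 1.00 are all assumptions made for illustration.

```python
from collections import Counter

MAX_STRATA = 25      # maximum number of strata supported by ERROR
MIN_SAMPLES = 3      # minimum number of observations per stratum


def check_error_input(text, tolerance=0.005):
    """Return a list of problems found in the text of an ERROR input file."""
    lines = [line for line in text.splitlines() if line.strip()]
    n_strata = int(lines[1].split()[0])                   # Data Group 2
    if n_strata > MAX_STRATA:
        return ["number of strata exceeds maximum"]
    problems = []
    weights = [float(line.split()[1]) for line in lines[2:2 + n_strata]]
    if abs(sum(weights) - 1.0) > tolerance:               # Data Group 3
        problems.append("weights do not sum to 1.00")
    counts = Counter(int(line.split()[0]) for line in lines[2 + n_strata:])
    for stratum in range(1, n_strata + 1):                # Data Group 4
        if counts[stratum] < MIN_SAMPLES:
            problems.append(f"less than 3 samples in stratum {stratum}")
    return problems


# Hypothetical data set with two deliberate faults.
sample = "DEMO\n2\n1 0.4\n2 0.5\n1 1.0\n1 2.0\n2 4.0\n2 5.0\n2 6.0\n"
print(check_error_input(sample))
```

An empty list means the data set passes all three checks.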


Documented  Session 

This example execution of ERROR uses the EG.ERR data set provided
on the SDS distribution diskette. These data were derived from studies
conducted on Eau Galle Reservoir in west-central Wisconsin. Composite
epilimnetic samples for chlorophyll a were taken at approximately 2-week
intervals at Station 20 (a station located at the deepest part of the
lake). The data were stratified a posteriori into four strata: spring,
summer, fall, and winter. The strata were defined as follows: “1” for
spring - April and May (61 days), “2” for summer - June, July, August,
and September (122 days), “3” for fall - October and November (61 days),
and “4” for winter - December, January, February, and March (121 days).
Strata weights were calculated by dividing the number of days in the
stratum by 365.

The object of the analysis is to determine if the sampling design can be
improved through the use of a stratified design using an optimal
allocation of samples to the strata. Error analysis calculates the error
variance associated with the existing distribution of samples and
determines an optimal distribution based on the observed variance among
strata. If the existing and the optimal distributions of samples are
considerably different, the sampling design can be improved by adopting
the optimal distribution.
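
The quantities involved can be written out explicitly. The manual does not print the formulas used by ERROR; the expressions below are the standard results for stratified sampling with optimal (Neyman) allocation and no finite-population correction, given here as an assumption. They are consistent with the values shown in the documented session. Here W_h is the weight of stratum h, s_h^2 its sample variance, n_h the number of samples taken in it, and n the total number of samples.

```latex
\begin{aligned}
\operatorname{Var}(\bar{y}_{st}) &= \sum_{h} \frac{W_h^{2}\, s_h^{2}}{n_h}
  && \text{(error variance with the existing design)} \\
\frac{n_h}{n} &= \frac{W_h\, s_h}{\sum_{k} W_k\, s_k}
  && \text{(optimal fraction of samples in stratum } h\text{)} \\
\operatorname{Var}_{opt}(\bar{y}_{st}) &= \frac{1}{n}\Bigl(\sum_{h} W_h\, s_h\Bigr)^{2}
  && \text{(error variance with the optimal design)}
\end{aligned}
```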

Entering the command “ERROR” at the DOS prompt begins the program.
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After  pressing  any  key,  the  program  prompts  for  the  output  route. 


    Select output route

    1) Screen only

Press F1 for help.

Press F2 to continue and clear the help window.


Select  2  (disk  file  output).  ERROR  then  prompts  for  the  output  file  name. 
Use  EG.OUT  for  this  session. 



The  program  then  prompts  for  the  input  file  name.  Use  EG.ERR  for  this 
session. 




ERROR  then  displays  the  statistics  for  the  stratified  sample. 


Press F1 for help.

    EAU GALLE - 1981 - STATION 20

    STRATIFIED SAMPLE STATISTICS

    MEAN             3.62E+01
    VARIANCE         1.85E+02
    ERROR VARIANCE   3.97E+01

    Help - Stratified sample statistics

    Statistics (mean, variance, and error variance) for the stratified
    sample.

    F2 - Continue

    F1-Help  F2-Sample Stat  F3-Strata Stat  F4-Analysis  F5-Exit


Press F2 to continue and clear the help window.

Press F3 to see the strata statistics.

Press F1 for help.

    EAU GALLE - 1981 - STATION 20

    STRATA STATISTICS

    STRATUM   N   MEAN       VARIANCE   ERROR VARIANCE
    1         4   7.33E+01   4.85E+03   1.21E+03
    2         9   4.18E+01   3.42E+02   3.80E+01
    3         4   3.19E+01   4.51E+01   1.13E+01
    4         8   1.39E+01   9.34E+01   1.17E+01

    Help - Strata statistics

    Statistics (number of samples, mean, variance, and error variance)
    for each of the sampled strata.

    F2 - Continue

    F1-Help  F2-Sample Stat  F3-Strata Stat  F4-Analysis  F5-Exit


Press  F2  to  continue  and  clear  the  help  window.  The  screen  returns  to  the 
strata  statistics.  Press  F4  to  see  the  results  of  the  error  analysis. 


    EAU GALLE - 1981 - STATION 20

    ERROR ANALYSIS

    STRATUM   % VARIANCE   % N    % OPTIMUM
    1         85.3         16.0   52.5
    2         10.7         36.0   27.9
    3          0.8         16.0    5.1
    4          3.2         32.0   14.5

    VARIANCE WITH EXISTING DESIGN   3.97E+01
    VARIANCE WITH OPTIMAL DESIGN    1.96E+01

    F1-Help  F2-Sample Stat  F3-Strata Stat  F4-Analysis  F5-Exit


Press F1 for help.

    EAU GALLE - 1981 - STATION 20

    ERROR ANALYSIS

    Help - Error analysis

    The %Variance column gives the relative contribution of each stratum
    to the overall stratified sample variance. The %N column shows how
    the samples were distributed among the strata. Using the observed
    distribution of variance among strata (the %Variance column), error
    analysis suggests an optimal distribution of samples among the
    strata (the %Optimum column). The reported 'Variance with optimal
    design' is the error variance that would result if the optimal
    design was adopted for future sampling (if conditions do not
    dramatically change over time).

    F2 - Continue

    F1-Help  F2-Sample Stat  F3-Strata Stat  F4-Analysis  F5-Exit


Press  F2  to  continue  and  clear  the  help  window. 


    EAU GALLE - 1981 - STATION 20

    ERROR ANALYSIS

    STRATUM   % VARIANCE   % N    % OPTIMUM
    1         85.3         16.0   52.5
    2         10.7         36.0   27.9
    3          0.8         16.0    5.1
    4          3.2         32.0   14.5

    VARIANCE WITH EXISTING DESIGN   3.97E+01
    VARIANCE WITH OPTIMAL DESIGN    1.96E+01

    F1-Help  F2-Sample Stat  F3-Strata Stat  F4-Analysis  F5-Exit


At any time during the program you can switch between the output
windows. Press F2 to return to the sample statistics screen.
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Press  F4  to  return  to  the  error  analysis  screen. 


The results of the error analysis indicate that the error variance could
be reduced to less than 50 percent (19.6/39.7 = 0.494) of its observed
value by using the optimal design. The optimal design consists of a
redistribution of samples to place more samples in highly variable strata
and fewer samples in strata with less variability. The spring stratum
(stratum 1) accounts for over 85 percent of the observed variance (%
Variance column), but only 16 percent (% N column) of the samples were
allocated to this stratum. The optimal design would allocate just over 52
percent (% Optimum column) of the samples to this stratum. The winter
stratum (stratum 4) accounts for only 3 percent of the observed variance,
but 32 percent of the sampling effort was allocated to this stratum. The
optimal design suggests that only about 15 percent of the samples should
be dedicated to this stratum.
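
The figures in the error analysis table can be checked by hand. The following Python sketch recomputes them from the strata statistics shown earlier in the session. It is not the ERROR source code: it assumes the standard stratified-sampling formulas (error variance as the sum of W^2 s^2 / n, optimal allocation proportional to W times s) and assumes that the strata weights are the day counts divided by 365 and rounded to three decimals.

```python
from math import sqrt

days = [61, 122, 61, 121]                    # days in strata 1-4
n = [4, 9, 4, 8]                             # samples taken in each stratum
variance = [4.85e3, 3.42e2, 4.51e1, 9.34e1]  # stratum variances from the screen

weights = [round(d / 365, 3) for d in days]  # assumed 3-decimal weights
total_n = sum(n)

# Contribution of each stratum to the error variance of the stratified mean.
terms = [w * w * v / k for w, v, k in zip(weights, variance, n)]
existing = sum(terms)

# Optimal allocation: sample share proportional to weight x standard deviation.
ws = [w * sqrt(v) for w, v in zip(weights, variance)]
optimal = sum(ws) ** 2 / total_n

pct_variance = [100 * t / existing for t in terms]
pct_n = [100 * k / total_n for k in n]
pct_optimum = [100 * x / sum(ws) for x in ws]

for h in range(len(n)):
    print(f"{h + 1}  {pct_variance[h]:5.1f}  {pct_n[h]:5.1f}  {pct_optimum[h]:5.1f}")
print(f"existing {existing:.1f}  optimal {optimal:.1f}  ratio {optimal / existing:.3f}")
```

Under these assumptions the script gives percentages and variances that agree with the screen values to the precision displayed.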




Press  F5  to  exit. 


At this point you may choose to either run ERROR on another data set
or exit from the program.
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Example  Output  File 


    ERROR ANALYSIS

    EAU GALLE - 1981 - STATION 20                      Title

    STRATIFIED SAMPLE STATISTICS                       Statistics for the
    MEAN             3.62E+01                          entire stratified
    VARIANCE         1.85E+02                          sample
    ERROR VARIANCE   3.97E+01

    STRATA STATISTICS                                  Statistics for each
    STRATUM   N   MEAN       VARIANCE   ERROR VARIANCE of the strata
    1         4   7.33E+01   4.85E+03   1.21E+03
    2         9   4.18E+01   3.42E+02   3.80E+01
    3         4   3.19E+01   4.51E+01   1.13E+01
    4         8   1.39E+01   9.34E+01   1.17E+01

    ERROR ANALYSIS                                     Error analysis
    STRATUM   % VARIANCE   % N    % OPTIMUM
    1         85.3         16.0   52.5
    2         10.7         36.0   27.9
    3          0.8         16.0    5.1
    4          3.2         32.0   14.5

    VARIANCE WITH EXISTING DESIGN   3.97E+01
    VARIANCE WITH OPTIMAL DESIGN    1.96E+01
5 Cluster Analysis


Cluster analysis is a multivariate classification technique that may be
used to group or identify similar objects or entities. In a data analysis
situation (rather than a sampling design evaluation), cluster analysis
may be used to group a set of reservoirs according to their trophic state
or by the composition of their phytoplankton. The use of cluster analysis
in a typical data analysis mode can be found in Gaugush (1986). For the
purposes of sampling design evaluation, cluster analysis can be used to
identify and possibly reduce redundancies in the sampling design. The use
of cluster analysis for this type of application is described more
completely in Gaugush (1987).

In the evaluation of a sampling design, cluster analysis can be used to
examine the quality of the information being provided by elements of the
sampling design. In cluster analysis these elements are referred to as
“entities” and may be sampling stations, dates, and/or the strata used in
a stratified sampling design. The analysis begins with each entity in its
own cluster and proceeds to join similar clusters until all of the
entities are in a single cluster. The object, when used to evaluate a
sampling design, is to determine if all of the elements of the design are
providing independent information. For example, assume that data have
been collected for twelve stations in a reservoir and a cluster analysis
of the data indicates that the data fall into four clusters, each
represented by three stations. This implies that some of the stations are
redundant (they are supplying essentially the same information). If the
sampling program were to be continued (as in a monitoring program), the
results of the cluster analysis could be used to reduce sampling effort.
Sampling only 1 of the 3 stations from each cluster would result in the
use of 4 stations rather than 12.

The CLUSTER program can be used to identify redundancies in sampling
programs and suggest ways in which to reduce sampling effort in future
studies. CLUSTER uses one of three clustering methods (average linkage,
centroid, or Ward’s method) to cluster the data; outputs a tabular
“history” of the clustering; and produces a dendrogram of the clustering.
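
The joining procedure described above (each entity starts in its own cluster, and the two most similar clusters are joined until one remains) can be sketched in a few lines. The manual does not list CLUSTER's internal algorithm, so this Python sketch is only an illustration of average linkage under two stated assumptions: variables are standardized to zero mean and unit standard deviation, and similarity is measured by Euclidean distance. The function names and the data are hypothetical.

```python
from itertools import combinations
from math import dist
from statistics import mean, stdev


def standardize(rows):
    """Scale each variable (column) to zero mean and unit standard deviation."""
    columns = list(zip(*rows))
    centers = [mean(c) for c in columns]
    spreads = [stdev(c) for c in columns]
    return [[(x - m) / s for x, m, s in zip(row, centers, spreads)]
            for row in rows]


def average_linkage(rows):
    """Return the clustering history as (members_a, members_b, distance)."""
    points = standardize(rows)
    clusters = [[i] for i in range(len(points))]   # each entity starts alone
    history = []
    while len(clusters) > 1:
        def separation(pair):
            # Average distance over all pairs of members, one from each cluster.
            a, b = pair
            return mean(dist(points[i], points[j]) for i in a for j in b)
        a, b = min(combinations(clusters, 2), key=separation)
        history.append((a, b, separation((a, b))))
        clusters = [c for c in clusters if c is not a and c is not b] + [a + b]
    return history


# Hypothetical data: four entities (numbered from 0), two variables.
demo = [[1.0, 10.0], [1.1, 10.5], [5.0, 30.0], [5.2, 29.0]]
history = average_linkage(demo)
for members_a, members_b, distance in history:
    print(members_a, members_b, round(distance, 3))
```

Each printed line corresponds to one stage of a clustering history: the two clusters joined and the distance at which they were joined.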
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Data  Set  Preparation 

The Cluster Analysis program requires that input data sets be prepared
prior to its use (i.e., data cannot be entered while the program is
running). Data sets can be prepared with most text editors and word
processing software. The data sets may contain only ASCII characters and
none of the special characters used by most word processors for
formatting. If you use a word processor to generate your data sets, be
sure to save the files in DOS or ASCII format.

Data in CLUSTER input files are organized into four groups:

Group 1 - title
Group 2 - problem size identifiers
Group 3 - entity names
Group 4 - data records

An example data set, EG.CLS, is provided on the SDS distribution
diskette and is shown below:

    EAU GALLE              Data Group 1
    5 3                    Data Group 2
    STA10                  Data Group 3
    STA20
    STA30
    STA50
    STA60
    .065 1.507 44.125      Data Group 4
    .078 1.503 43.144
    .088 1.473 41.155
    .068 1.427 33.800
    .070 1.487 46.068


CLUSTER does not require strict positioning of data in specific columns,
but it does have two simple requirements: (a) each line must start in
column 1, and (b) multiple items on a single line must be separated by
one blank space. Data Group 1 consists of a single line specifying a
title for the data set (maximum of 60 characters). Data Group 2 is a
single line with two items. The first is the number of entities in the
data set, and the second indicates the number of variables to be used.
The CLUSTER program can handle a maximum of 50 entities with a maximum
of 10 variables. Data Group 3 provides the names of the entities (one
line for each of the entities specified in Data Group 2). Each name can
have a maximum of 20 characters. In the example data set, the entities
are water quality sampling stations in Eau Galle Reservoir. Data Group 4
lists the data for the variables (one line for each entity and in the
same order) to be used in the cluster analysis. In the example data set,
these variables are total phosphorus, total nitrogen, and chlorophyll a
concentrations (from left to right).

Suggestion: use an extension of .CLS for CLUSTER data files. This will
distinguish them from other data files.
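
The layout of the four data groups can be read back with a short script, which is a convenient way to confirm that a data set is organized as CLUSTER expects. This is a minimal Python sketch with a hypothetical function name, `read_cluster_input`, and a made-up data set; it is not part of SDS.

```python
def read_cluster_input(text):
    """Split the text of a CLUSTER input file into its four data groups."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    title = lines[0]                                          # Group 1
    n_entities, n_variables = (int(x) for x in lines[1].split())  # Group 2
    names = lines[2:2 + n_entities]                           # Group 3
    records = [[float(x) for x in line.split()]               # Group 4
               for line in lines[2 + n_entities:2 + 2 * n_entities]]
    if any(len(record) != n_variables for record in records):
        raise ValueError("record length does not match number of variables")
    return title, names, records


# Hypothetical data set: three entities, two variables.
example = """DEMO RESERVOIR
3 2
SITE A
SITE B
SITE C
0.05 1.2
0.07 1.4
0.06 1.3
"""
title, names, records = read_cluster_input(example)
print(title, dict(zip(names, records)))
```

Pairing the names with the records, as in the last line, makes it easy to see that the data lines are in the same order as the entity names.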
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Program  Execution 

To run the Cluster Analysis program, simply type “cluster” at the DOS
prompt. Be sure your default directory (i.e., the directory that you are
in when you enter the above command) contains all of the files provided
on the Sampling Design Software disk.

After  the  above  command  is  entered,  the  program  will  prompt  you  for 
all  of  the  necessary  inputs.  Program  flow  is  as  follows: 

a.  Introductory  screen. 

b. Prompt for input file name.

c. Prompt for output file name.

d. Prompt for clustering method.

e. View output.

f. Exit program.

A  documented  session  presented  below  provides  a  more  complete  view 
of  program  flow. 


Error  Messages 


After prompting for the input and output file names, CLUSTER performs
an error check on the input data set. If the data set specifies either
more than 50 entities or more than 10 variables in Data Group 2, CLUSTER
outputs the following:

ERROR IN INPUT FILE

EITHER NUMBER OF ENTITIES > 50 OR
NUMBER OF VARIABLES > 10

EDIT INPUT FILE AND BEGIN AGAIN

After displaying the error message the program terminates.

CLUSTER also performs an error check on Data Group 4. If the standard
deviation of any of the variables is zero, CLUSTER outputs the
following:

ERROR IN DATA

STANDARD DEVIATION FOR VARIABLE j IS ZERO

THIS MEANS THAT VARIABLE j IS THE SAME FOR
ALL ENTITIES AND WILL SERVE NO PURPOSE IN THE
CLUSTER ANALYSIS - DELETE THE VARIABLE FROM THE
INPUT FILE AND BEGIN AGAIN

As  the  error  message  states,  a  variable  without  variance  (standard 
deviation  equal  to  zero)  does  not  add  information  to  the  cluster  analysis. 
After  displaying  the  error  message,  the  program  terminates. 
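
Both checks are easy to run on a data set in advance. The following Python sketch is an illustration only: the function name `check_cluster_data` and the message wording are hypothetical, and the records are made-up values in which the third variable is identical for every entity.

```python
from statistics import pstdev

MAX_ENTITIES = 50     # maximum number of entities handled by CLUSTER
MAX_VARIABLES = 10    # maximum number of variables handled by CLUSTER


def check_cluster_data(records):
    """Return messages for problems that would stop a CLUSTER run."""
    if len(records) > MAX_ENTITIES or len(records[0]) > MAX_VARIABLES:
        return ["too many entities or variables"]
    messages = []
    for j, column in enumerate(zip(*records), start=1):
        if pstdev(column) == 0:      # same value for every entity
            messages.append(f"standard deviation for variable {j} is zero")
    return messages


# Hypothetical records: one line per entity, one column per variable.
records = [[0.05, 1.2, 7.0], [0.07, 1.4, 7.0], [0.06, 1.3, 7.0]]
print(check_cluster_data(records))
```

A variable flagged this way should be deleted from the input file, with the variable count in Data Group 2 reduced to match.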


Documented  Session 

This example execution of CLUSTER uses the EG.CLS data set
provided on the SDS distribution diskette. These data were derived from
studies conducted on Eau Galle Reservoir in west-central Wisconsin. The
entities are five water quality stations within the reservoir. Stations
10 and 60 (STA10 and STA60) are littoral stations located in two
different coves. Station 40 (STA40) is an inlet station. Station 30
(STA30) is located over the old river channel, and Station 20 (STA20) is
located over the deepest portion of the pool. These stations were
routinely sampled, and the data in Group 4 of EG.CLS are station means
for total phosphorus, total nitrogen, and chlorophyll a in the epilimnion
(0 - 3 m) for one growing season (April - September).

The  object  of  the  analysis  is  to  determine  if  any  of  the  stations  are 
redundant.  If  two  or  more  stations  are  supplying  the  same  information, 
the  possibility  exists  for  reducing  the  number  of  stations.  Reducing  the 
number  of  stations  brings  about  the  obvious  reduction  in  costs  without 
reducing  the  information  derived  from  the  sampling  program. 
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Entering  the  command  “CLUSTER"  at  the  DOS  prompt  begins  the 
program. 


    Cluster Analysis

    Sampling Design Software - Version 2.0

    Developed by
    Dr. Robert F. Gaugush
    Environmental Laboratory
    USAE Waterways Experiment Station

    (Press any key to continue...)

    Created using Turbo Pascal. Copyright Borland International 1981, 1989


After pressing any key, the program prompts for the input file name.
For this session enter EG.CLS.

    Input data file name? eg.cls

    Provide the file name of your data file. Paths are accepted.


CLUSTER  then  prompts  for  the  output  file  name.  Use  EG.OUT  for 
this  session. 


    Output data file name? eg.out

    Provide a file name of your output data file. Paths are accepted.

At this point CLUSTER prompts for the method to be used in the cluster
analysis. Help windows are available by pressing F1, F2, F3, or F4.

    CLUSTERING METHOD:  AVERAGE LINKAGE (A)
                        CENTROID (C)
                        WARDS (W)

    Enter choice of method...

    F1 - General help
    Specific help:  F2 - Avg linkage  F3 - Centroid  F4 - Wards

Press F1 and the following is displayed.

    CLUSTERING METHOD:  AVERAGE LINKAGE (A)
                        CENTROID (C)
                        WARDS (W)

    Help - Clustering methods

    Three methods (average linkage, centroid, and Wards) are available
    to use to cluster the data. Select a method by entering the letter
    associated with the desired method.

    F2 - Continue


Press  F2  to  continue,  and  the  help  window  is  removed. 


    CLUSTERING METHOD:  AVERAGE LINKAGE (A)
                        CENTROID (C)
                        WARDS (W)

    Enter choice of method...

    F1 - General help
    Specific help:  F2 - Avg linkage  F3 - Centroid  F4 - Wards

Press  A  to  select  the  average  linkage  method. 


    File: eg.out

    Cluster Analysis

    EAU GALLE

    ID
    NUMBER   ENTITY
    1        STA10
    2        STA20
    3        STA30
    4        STA50
    5        STA60

    Average Linkage Method used for clustering

    F1-Help  F2-Exit  Movement Keys: Home, End, PgUp, PgDn, Up and Down Arrows

At this point the cluster analysis is complete and you can view your
output file (in this case EG.OUT as indicated in the first line). The
cursor movement keys (Home, End, Page Up, Page Down, up arrow, and down
arrow as indicated on the last line) allow you to browse through the
output file. Press Page Down.


    File: eg.out

    Stage   Clusters Joined   Distance
    1       1   5             6.080E-01
    2       1   3             1.219E+00
    3       1   2             3.191E+00
    4       1   4             6.000E+00

    The distances are segmented into the following
    classes for the Linear dendrogram

    CLASS   LOWER BOUND   UPPER BOUND
    1       6.080E-01     8.237E-01
    2       8.237E-01     1.039E+00
    3       1.039E+00     1.255E+00
    4       1.255E+00     1.471E+00
    5       1.471E+00     1.686E+00

    F1-Help  F2-Exit  Movement Keys: Home, End, PgUp, PgDn, Up and Down Arrows

Again the display moves 20 lines down. Press Home.
The display moves up three lines. The other movement keys operate in
a similar manner. Press F1 for help.

    File: eg.out

    21      3.796E+00     4.160E+00
    22      4.160E+00     4.559E+00
    23      4.559E+00     4.996E+00
    24      4.996E+00     5.475E+00
    25      5.475E+00     6.000E+00

    Help - Cluster analysis output

    Short descriptions of various portions of the output are available.

    F3 - Entity and ID numbers
    F4 - Stages of clustering
    F5 - Distances
    F6 - Dendrogram
    F7 - Linear vs. geometric scales for the dendrogram

    F2 - Continue

    F1-Help  F2-Exit  Movement Keys: Home, End, PgUp, PgDn, Up and Down Arrows


A help menu window is displayed over the output file. Press F3.

    File: eg.out

    21      3.796E+00     4.160E+00
    22      4.160E+00     4.559E+00

    Help - Entity listing

    This section lists the ID numbers that have been assigned to the
    entities in the data set. Entities can be stations, dates, depths,
    reservoirs, etc. This listing will be necessary to interpret the
    dendrogram.

    F2 - Continue

    F1-Help  F2-Exit  Movement Keys: Home, End, PgUp, PgDn, Up and Down Arrows

A second help window appears describing the association between ID
numbers and the entity names in the data set. Pressing F2 (Continue)
removes both help screens and restores the output screen. Press F2 to
continue followed by F1 for the help menu, and then press F4 for help on
the clustering stages.

    File: eg.out

    21      3.796E+00     4.160E+00
    22      4.160E+00     4.559E+00

    Help - Clustering stages

    This section of the output provides a tabular display of the data
    used to develop the dendrogram. At each stage of the clustering, two
    clusters are joined (shown in the 'Clusters Joined' column) to form
    a new cluster. The 'Distance' column provides a measure of the
    relative similarity of the members of the cluster. The smaller the
    distance, the greater the similarity.

    F2 - Continue

    F1-Help  F2-Exit  Movement Keys: Home, End, PgUp, PgDn, Up and Down Arrows


Press F2 to continue followed by F1 for the help menu and F5 for help
on the distance classes.

    File: eg.out

    21      3.796E+00     4.160E+00
    22      4.160E+00     4.559E+00

    Help - Distances

    The range of relative distance (presented in the output describing
    the clustering stages) is divided into 25 discrete classes. This is
    necessary to accommodate the techniques used to develop the graphical
    depiction of the dendrogram.

    F2 - Continue

    F1-Help  F2-Exit  Movement Keys: Home, End, PgUp, PgDn, Up and Down Arrows


Press F2 to continue followed by F1 for the help menu and F6 for the
dendrogram help window.

    File: eg.out

    21      3.796E+00     4.160E+00
    22      4.160E+00     4.559E+00

    Help - Dendrogram

    The graphical display from a cluster analysis is referred to as a
    dendrogram because of its tree-like appearance. At the 'trunk', all
    of the entities have been joined into a single cluster (at the far
    right of the dendrogram). At the far left, each of the 'branches'
    represents a single entity and each cluster has only one entity.
    Moving from left to right, clusters are joined until all of the
    entities have been combined into a single cluster.

    The ID values listed along the left margin correspond to those
    assigned to the entities in the data set. The values (1 - 25) along
    the top and bottom of the dendrogram correspond to the criterion
    values and provide a relative measure of the similarity between
    members of a cluster. Clusters at the left are composed of more
    similar members than clusters at the right.

    F2 - Continue

    F1-Help  F2-Exit  Movement Keys: Home, End, PgUp, PgDn, Up and Down Arrows


Press F2 to continue followed by F1 for the help menu and F7 for help
on the scales used for depicting the dendrogram.

    File: eg.out

    21      3.796E+00     4.160E+00
    22      4.160E+00     4.559E+00

    Help - Linear vs. geometric scales

    Dendrograms are output using both a linear and geometric scale for
    the relative distances between members of a cluster. This is done
    because if the range of relative distances is very large, the plot
    algorithm gets "confused" when drawing the left side (where the
    relative distances are at a minimum) of the dendrogram using a
    linear scale. When the range of relative distances is large and a
    linear scale is used, there is too much detail on the left side of
    the dendrogram for the algorithm to deal with.

    When the distance range is large (> than two orders of magnitude)
    the dendrogram plotted on a geometric scale will provide a better
    representation of the clustering.

    F2 - Continue

    F1-Help  F2-Exit  Movement Keys: Home, End, PgUp, PgDn, Up and Down Arrows


Press  F2  to  continue  and  F2  again  to  exit  the  program. 

Using the dendrogram (the entire output file is presented in the next
section), one can see that the two littoral stations (ID numbers 1 and 5) are
very similar and are clustered together in the first stage. The inlet station
(ID number 4) is very different from all of the other stations and is
grouped with the rest only at the last stage. With this information, it may be
possible to reduce sampling effort at this reservoir by sampling only one
of the two littoral stations currently being sampled.
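
The same reading can be checked by replaying the "Clusters Joined" column of the clustering history in the output file. The Python sketch below is illustrative only: `replay_history` is a hypothetical helper, and it assumes that a merged cluster is labelled by its lowest member ID, which is consistent with the example history but is not confirmed as the SDS convention.

```python
def replay_history(entity_ids, history):
    """Return the clusters present after each stage of the history.

    Assumption: a merged cluster keeps the lowest ID among its members.
    """
    clusters = {i: {i} for i in entity_ids}  # label -> set of member IDs
    snapshots = []
    for left, right in history:
        merged = clusters.pop(left) | clusters.pop(right)
        clusters[min(merged)] = merged
        snapshots.append(sorted(sorted(c) for c in clusters.values()))
    return snapshots


# "Clusters Joined" pairs for stages 1 through 4 of the example output file.
history = [(1, 5), (1, 3), (1, 2), (1, 4)]

for stage, snapshot in enumerate(replay_history([1, 2, 3, 4, 5], history), start=1):
    print(stage, snapshot)
```

Stage 1 pairs entities 1 and 5 (the littoral stations), and entity 4 (the inlet station) remains a single-member cluster until the final stage.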


Example Output File

Cluster Analysis

EAU GALLE  -  Title provided in input data set


  ID
NUMBER

   1    STA10
   2    STA20
   3    STA30
   4    STASO
   5    STASO


ID numbers associated with entity names


Average Linkage Method used for clustering


Method used
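
For readers unfamiliar with the method named in the output, the following Python sketch shows one common form of average linkage (unweighted, in which the distance between two clusters is the mean of all pairwise distances between their members). It is not the SDS source code, and the manual does not state which variant or distance measure SDS uses. The distance values are hypothetical, and the convention of keeping the lower ID as the cluster label is an assumption.

```python
from itertools import combinations


def average_distance(cluster_a, cluster_b, distances):
    """Mean of all pairwise distances between members of two clusters."""
    pairs = [(i, j) for i in cluster_a for j in cluster_b]
    return sum(distances[frozenset(pair)] for pair in pairs) / len(pairs)


def average_linkage(entity_ids, distances):
    """Return the history as (stage, kept_id, absorbed_id, distance) tuples."""
    clusters = {i: [i] for i in entity_ids}  # label -> member IDs
    history = []
    stage = 0
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest average distance.
        a, b = min(
            combinations(sorted(clusters), 2),
            key=lambda pair: average_distance(
                clusters[pair[0]], clusters[pair[1]], distances
            ),
        )
        d = average_distance(clusters[a], clusters[b], distances)
        clusters[a] = clusters[a] + clusters.pop(b)  # keep the lower label
        stage += 1
        history.append((stage, a, b, d))
    return history


# Hypothetical distances between four entities (not SDS data).
distances = {
    frozenset({1, 2}): 1.0, frozenset({1, 3}): 4.0, frozenset({1, 4}): 6.0,
    frozenset({2, 3}): 5.0, frozenset({2, 4}): 7.0, frozenset({3, 4}): 2.0,
}

for stage, a, b, d in average_linkage([1, 2, 3, 4], distances):
    print(f"{stage}  {a} {b}  {d:.3E}")
```

Each printed row has the same layout as the history in the output file: the stage number, the two clusters joined, and the distance at which they were joined.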


Stage   Clusters Joined   Distance

  1         1    5        6.080E-01
  2         1    3        1.21SE+00
  3         1    2        3.1S1E+00
  4         1    4        6.000E+00


"History" of the clustering


The distances are segmented into the following
classes for the Linear dendrogram

CLASS   LOWER BOUND   UPPER BOUND

  1      6.080E-01     8.237E-01
  2      8.237E-01     1.039E+00
  3      1.039E+00     1.255E+00
  4      1.255E+00     1.471E+00
  5      1.471E+00     1.686E+00
  6      1.686E+00     1.902E+00
  7      1.902E+00     2.118E+00
  8      2.118E+00     2.333E+00
  9      2.333E+00     2.549E+00
 10      2.549E+00     2.765E+00
 11      2.765E+00     2.981E+00
 12      2.981E+00     3.196E+00
 13      3.196E+00     3.412E+00
 14      3.412E+00     3.628E+00
 15      3.628E+00     3.843E+00
 16      3.843E+00     4.059E+00
 17      4.059E+00     4.275E+00
 18      4.275E+00     4.490E+00
 19      4.490E+00     4.706E+00
 20      4.706E+00     4.922E+00
 21      4.922E+00     5.137E+00
 22      5.137E+00     5.353E+00
 23      5.353E+00     5.569E+00
 24      5.569E+00     5.784E+00
 25      5.784E+00     6.000E+00

The range in distance between the last stage and the first stage of
the clustering is divided into 25 equal classes for displaying the
dendrogram.
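
The equal-width class bounds can be reproduced from the first- and last-stage distances alone. The Python sketch below is a hedged illustration rather than the SDS implementation: `linear_class_bounds` is a hypothetical name, and the only inputs assumed are the two distances printed in the listing (6.080E-01 and 6.000E+00) and the class count of 25.

```python
def linear_class_bounds(first_distance, last_distance, n_classes=25):
    """Split [first_distance, last_distance] into n_classes equal-width classes."""
    width = (last_distance - first_distance) / n_classes
    return [
        (first_distance + k * width, first_distance + (k + 1) * width)
        for k in range(n_classes)
    ]


bounds = linear_class_bounds(0.608, 6.0)
for number, (lower, upper) in enumerate(bounds, start=1):
    print(f"{number:2d}  {lower:.3E}  {upper:.3E}")
```

Each class is (6.000 - 0.608) / 25, or about 0.2157, distance units wide, so the upper bound of one class is the lower bound of the next.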
Dendrogram displayed using a linear scale


                                 Linear Scale

ID  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

 1
 5

 3
 2

 4

ID  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

                                 Linear Scale


The distances are segmented into the following
classes for the Geometric dendrogram

CLASS   LOWER BOUND   UPPER BOUND

  1      6.080E-01     6.663E-01
  2      6.663E-01     7.302E-01
  3      7.302E-01     8.003E-01
  4      8.003E-01     8.770E-01
  5      8.770E-01     9.611E-01
  6      9.611E-01     1.053E+00
  7      1.053E+00     1.154E+00
  8      1.154E+00     1.265E+00
  9      1.265E+00     1.386E+00
 10      1.386E+00     1.519E+00
 11      1.519E+00     1.665E+00
 12      1.665E+00     1.825E+00
 13      1.825E+00     2.000E+00
 14      2.000E+00     2.191E+00
 15      2.191E+00     2.401E+00
 16      2.401E+00     2.632E+00
 17      2.632E+00     2.884E+00
 18      2.884E+00     3.161E+00
 19      3.161E+00     3.464E+00
 20      3.464E+00     3.796E+00
 21      3.796E+00     4.160E+00
 22      4.160E+00     4.559E+00
 23      4.559E+00     4.996E+00
 24      4.996E+00     5.475E+00
 25      5.475E+00     6.000E+00

The range in distance between the last stage and the first stage of
the clustering is divided into 25 classes using a geometric scale.
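
The geometric bounds in the listing are consistent with a constant ratio between successive bounds, namely (6.000 / 0.608) raised to the power 1/25, or roughly 1.096. The Python sketch below assumes that construction. It is not taken from the SDS source, `geometric_class_bounds` is a hypothetical name, and the computed values agree with the printed table only to within about one unit in the last printed digit.

```python
def geometric_class_bounds(first_distance, last_distance, n_classes=25):
    """Split [first_distance, last_distance] into classes with a constant ratio."""
    if first_distance <= 0 or last_distance <= first_distance:
        raise ValueError("need 0 < first_distance < last_distance")
    ratio = (last_distance / first_distance) ** (1.0 / n_classes)
    return [
        (first_distance * ratio ** k, first_distance * ratio ** (k + 1))
        for k in range(n_classes)
    ]


bounds = geometric_class_bounds(0.608, 6.0)
for number, (lower, upper) in enumerate(bounds, start=1):
    print(f"{number:2d}  {lower:.3E}  {upper:.3E}")
```

Because the classes widen from left to right, the small distances at the left side of the dendrogram are spread over more classes than they are on the linear scale, which is the behavior described in the help text on linear versus geometric scales.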


Dendrogram displayed using a geometric scale


                                Geometric Scale

ID  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25


ID  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

                                Geometric Scale

